This dataset comes from Kaggle.com. It contains building simulation results from more than 250k EnergyPlus simulations, with 49 columns (including the ID) of data for analyzing building energy consumption.
Universal Design Space Exploration
Design Space Exploration (DSE) analysis techniques represent a data-centric approach to integrating performance analysis in early design phases when there is the greatest potential to cheaply improve the energy efficiency of a building. We focus on a novel extension of DSE called Universal Design Space Exploration (UDSE), which leverages massive databases of pre-simulated analysis that represent all possible outcomes of common analysis workflows. These databases, called Design Spaces, become “universal” when a single pre-simulated design space can be re-applied to future, unknown projects. Unlike current simulation methods, which require a design to exist before it can be analyzed and often take minutes or hours to simulate, UDSE leverages pre-simulation to deliver rapid and relevant insight as new designs are conceptualized. The data underpinning UDSE enables advanced statistical and Artificial Intelligence methods, allowing UDSE to deliver a greater understanding of the larger problem being explored rather than simply delivering analysis of several pre-conceived design options. We believe that UDSE can provide instantaneous, relevant analysis for all building design projects at negligible cost.
AutoBEM
Oak Ridge National Laboratory has developed a collection of software and algorithms, collectively referred to as “Automatic Building Energy Modeling” (AutoBEM), which allows building energy modeling of each building at large geographic scales (AutoBEM). Within AutoBEM, building properties are detected, inferred, or predicted as inputs to generate building energy models using OpenStudio and simulate these buildings using EnergyPlus. OpenStudio is a collection of software tools to support energy modeling in EnergyPlus, which is a physical building energy simulation engine (OpenStudio) (EnergyPlus).
import pandas as pd
data0 = pd.read_csv("Universal_Design_Space_Building_Energy_Simulation_input_output.csv",low_memory=False)
data0.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 260748 entries, 0 to 260747
Data columns (total 49 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | ID | 260748 non-null | object |
| 1 | BuildingType | 260748 non-null | object |
| 2 | ClimateZone | 260748 non-null | object |
| 3 | TotalArea | 260748 non-null | int64 |
| 4 | TotalArea_Setting | 260748 non-null | object |
| 5 | FloorArea | 260748 non-null | int64 |
| 6 | FloorArea_Setting | 260748 non-null | object |
| 7 | NumFloors | 260748 non-null | int64 |
| 8 | PlateDepth | 260748 non-null | int64 |
| 9 | PlateDepth_Setting | 260748 non-null | object |
| 10 | PlateLength | 260748 non-null | int64 |
| 11 | FloorHeight | 260748 non-null | int64 |
| 12 | FloorHeight_Setting | 260748 non-null | object |
| 13 | Height | 260748 non-null | int64 |
| 14 | WWR | 260748 non-null | float64 |
| 15 | WWR_surfaces | 260748 non-null | object |
| 16 | SolarDesign | 260748 non-null | object |
| 17 | Standard | 260748 non-null | object |
| 18 | HVAC | 260748 non-null | object |
| 19 | HVAC_Setting | 260748 non-null | object |
| 20 | EnvelopeQuality_Setting | 260748 non-null | object |
| 21 | Wall_R_Value | 260748 non-null | float64 |
| 22 | Roof_R_Value | 260748 non-null | float64 |
| 23 | Glass_and_Frame_U_Value | 260748 non-null | float64 |
| 24 | SHGC | 260748 non-null | float64 |
| 25 | LPD_Adjustment | 260748 non-null | float64 |
| 26 | LPD_Adjustment_Setting | 260748 non-null | object |
| 27 | Interior_Lights_Final_W_per_sf | 260748 non-null | float64 |
| 28 | Exterior_Lights_Final_1_W | 260748 non-null | float64 |
| 29 | Exterior_Lights_Final_2_W | 260748 non-null | float64 |
| 30 | Setpoint_Setting | 260748 non-null | object |
| 31 | HeatingCoil | 260748 non-null | object |
| 32 | COP_Efficiency_Heating | 260748 non-null | object |
| 33 | CoolingCoil | 260748 non-null | object |
| 34 | COP_Efficiency_Cooling | 260748 non-null | float64 |
| 35 | EUI_kBTU_per_sf | 260748 non-null | float64 |
| 36 | Electricity_Facility_kBTU_per_sf | 260748 non-null | float64 |
| 37 | NaturalGas_Facility_kBTU_per_sf | 260748 non-null | float64 |
| 38 | Cooling_Electricity_kBTU_per_sf | 260748 non-null | float64 |
| 39 | Heating_Electricity_kBTU_per_sf | 260748 non-null | float64 |
| 40 | Heating_NaturalGas_kBTU_per_sf | 260748 non-null | float64 |
| 41 | Heating_Total_kBTU_per_sf | 260748 non-null | float64 |
| 42 | WaterSystems_Electricity_kBTU_per_sf | 260748 non-null | float64 |
| 43 | Lighting_Electricity_kBTU_per_sf | 260748 non-null | float64 |
| 44 | Equipment_Electricity_kBTU_per_sf | 260748 non-null | float64 |
| 45 | Fans_Electricity_kBTU_per_sf | 260748 non-null | float64 |
| 46 | Pumps_Electricity_kBTU_per_sf | 260748 non-null | float64 |
| 47 | HeatRejection_Electricity_kBTU_per_sf | 260748 non-null | float64 |
| 48 | HeatRecovery_Electricity_kBTU_per_sf | 260748 non-null | float64 |

dtypes: float64(24), int64(7), object(18)
memory usage: 97.5+ MB
data = data0.copy()
data.describe()
| | TotalArea | FloorArea | NumFloors | PlateDepth | PlateLength | FloorHeight | Height | WWR | Wall_R_Value | Roof_R_Value | ... | Heating_Electricity_kBTU_per_sf | Heating_NaturalGas_kBTU_per_sf | Heating_Total_kBTU_per_sf | WaterSystems_Electricity_kBTU_per_sf | Lighting_Electricity_kBTU_per_sf | Equipment_Electricity_kBTU_per_sf | Fans_Electricity_kBTU_per_sf | Pumps_Electricity_kBTU_per_sf | HeatRejection_Electricity_kBTU_per_sf | HeatRecovery_Electricity_kBTU_per_sf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2.607480e+05 | 260748.000000 | 260748.000000 | 260748.000000 | 260748.000000 | 260748.000000 | 260748.000000 | 260748.000000 | 260748.000000 | 260748.000000 | ... | 260748.000000 | 260748.000000 | 260748.000000 | 260748.0 | 260748.000000 | 260748.000000 | 260748.000000 | 260748.000000 | 260748.000000 | 260748.000000 |
| mean | 1.994595e+05 | 25846.604507 | 8.866783 | 97.235906 | 209.707273 | 14.602774 | 129.436559 | 59.377107 | 22.190043 | 41.705422 | ... | 0.537301 | 43.137857 | 43.675158 | 0.0 | 8.084788 | 38.817513 | 22.115252 | 2.509235 | 0.395789 | 4.833672 |
| std | 2.156952e+05 | 10350.688871 | 10.753368 | 25.367408 | 76.845830 | 1.970821 | 161.182146 | 17.593025 | 7.072464 | 14.076231 | ... | 1.728176 | 97.983243 | 97.762052 | 0.0 | 4.609707 | 15.903090 | 32.499733 | 4.598613 | 1.391958 | 10.479290 |
| min | 3.994400e+04 | 14519.000000 | 1.000000 | 45.000000 | 129.000000 | 10.000000 | 13.000000 | 25.000000 | 8.330000 | 21.340000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.648153 | 11.218701 | 0.180277 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 1.016330e+05 | 14971.000000 | 4.000000 | 75.000000 | 140.000000 | 13.000000 | 52.000000 | 50.390625 | 16.330000 | 31.340000 | ... | 0.000000 | 0.000000 | 0.232082 | 0.0 | 4.537323 | 24.806507 | 2.727917 | 0.016485 | 0.000000 | 0.000000 |
| 50% | 1.497120e+05 | 24998.000000 | 6.000000 | 98.000000 | 197.000000 | 15.000000 | 85.000000 | 61.742424 | 20.330000 | 31.340000 | ... | 0.000000 | 0.688570 | 1.768745 | 0.0 | 7.506889 | 41.608445 | 7.107739 | 1.186714 | 0.000000 | 0.084560 |
| 75% | 2.018510e+05 | 39539.000000 | 10.000000 | 122.000000 | 238.000000 | 16.000000 | 135.000000 | 71.228070 | 30.000000 | 60.000000 | ... | 0.071649 | 8.487569 | 9.333217 | 0.0 | 10.513514 | 52.364630 | 15.606273 | 2.963894 | 0.229866 | 1.419913 |
| max | 1.009256e+06 | 40434.000000 | 67.000000 | 153.000000 | 404.000000 | 18.000000 | 1072.000000 | 89.910314 | 30.000000 | 60.000000 | ... | 39.978431 | 583.250051 | 583.250051 | 0.0 | 19.497817 | 68.665014 | 103.441310 | 34.478568 | 15.186499 | 33.107724 |
8 rows × 31 columns
data.head()
| | ID | BuildingType | ClimateZone | TotalArea | TotalArea_Setting | FloorArea | FloorArea_Setting | NumFloors | PlateDepth | PlateDepth_Setting | ... | Heating_Electricity_kBTU_per_sf | Heating_NaturalGas_kBTU_per_sf | Heating_Total_kBTU_per_sf | WaterSystems_Electricity_kBTU_per_sf | Lighting_Electricity_kBTU_per_sf | Equipment_Electricity_kBTU_per_sf | Fans_Electricity_kBTU_per_sf | Pumps_Electricity_kBTU_per_sf | HeatRejection_Electricity_kBTU_per_sf | HeatRecovery_Electricity_kBTU_per_sf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | College_1A_100000_14286_120_13_25_20_25_BaseLi... | College | 1A | 101793 | low | 14542 | low | 7 | 122 | high | ... | 0.083671 | 0.0 | 0.083671 | 0.0 | 10.209886 | 44.423582 | 5.378116 | 4.748715 | 0.0 | 3.704229 |
| 1 | College_1A_100000_14286_120_13_25_20_25_BaseLi... | College | 1A | 101793 | low | 14542 | low | 7 | 122 | high | ... | 0.083671 | 0.0 | 0.083671 | 0.0 | 10.209886 | 44.423582 | 3.689256 | 4.675590 | 0.0 | 3.647120 |
| 2 | College_1A_100000_14286_120_13_25_20_25_BaseLi... | College | 1A | 101793 | low | 14542 | low | 7 | 122 | high | ... | 0.083856 | 0.0 | 0.083856 | 0.0 | 10.209886 | 44.423582 | 5.725152 | 4.350329 | 0.0 | 3.657521 |
| 3 | College_1A_100000_14286_120_13_25_20_25_BaseLi... | College | 1A | 101793 | low | 14542 | low | 7 | 122 | high | ... | 0.083856 | 0.0 | 0.083856 | 0.0 | 10.209886 | 44.423582 | 3.751486 | 4.224587 | 0.0 | 3.551803 |
| 4 | College_1A_100000_14286_120_13_25_20_25_BaseLi... | College | 1A | 101793 | low | 14542 | low | 7 | 122 | high | ... | 0.062975 | 0.0 | 0.062975 | 0.0 | 10.209886 | 44.423582 | 4.730901 | 4.131386 | 0.0 | 3.586181 |
5 rows × 49 columns
data.columns
Index(['ID', 'BuildingType', 'ClimateZone', 'TotalArea', 'TotalArea_Setting',
'FloorArea', 'FloorArea_Setting', 'NumFloors', 'PlateDepth',
'PlateDepth_Setting', 'PlateLength', 'FloorHeight',
'FloorHeight_Setting', 'Height', 'WWR', 'WWR_surfaces', 'SolarDesign',
'Standard', 'HVAC', 'HVAC_Setting', 'EnvelopeQuality_Setting',
'Wall_R_Value', 'Roof_R_Value', 'Glass_and_Frame_U_Value', 'SHGC',
'LPD_Adjustment', 'LPD_Adjustment_Setting',
'Interior_Lights_Final_W_per_sf', 'Exterior_Lights_Final_1_W',
'Exterior_Lights_Final_2_W', 'Setpoint_Setting', 'HeatingCoil',
'COP_Efficiency_Heating', 'CoolingCoil', 'COP_Efficiency_Cooling',
'EUI_kBTU_per_sf', 'Electricity_Facility_kBTU_per_sf',
'NaturalGas_Facility_kBTU_per_sf', 'Cooling_Electricity_kBTU_per_sf',
'Heating_Electricity_kBTU_per_sf', 'Heating_NaturalGas_kBTU_per_sf',
'Heating_Total_kBTU_per_sf', 'WaterSystems_Electricity_kBTU_per_sf',
'Lighting_Electricity_kBTU_per_sf', 'Equipment_Electricity_kBTU_per_sf',
'Fans_Electricity_kBTU_per_sf', 'Pumps_Electricity_kBTU_per_sf',
'HeatRejection_Electricity_kBTU_per_sf',
'HeatRecovery_Electricity_kBTU_per_sf'],
dtype='object')
# Check unique values
# col = [i for i in data.columns]
# for i in col:
#     print(i,"---> \n",data[i].unique(),"\n")
data = data.drop(columns=["TotalArea_Setting","FloorArea_Setting","PlateDepth_Setting",
"FloorHeight_Setting","WWR_surfaces","Standard","HVAC",
"WaterSystems_Electricity_kBTU_per_sf","LPD_Adjustment",
"Setpoint_Setting"])
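The `describe()` output above shows `WaterSystems_Electricity_kBTU_per_sf` with mean, standard deviation, minimum, and maximum all equal to 0, so it carries no information, which is presumably why it is among the dropped columns. As a minimal sketch (on a toy frame with hypothetical column names, not the real data), constant columns can be found programmatically rather than by reading `unique()` output:

```python
import pandas as pd

# Toy frame standing in for the real data; names and values are hypothetical.
toy = pd.DataFrame({
    "WaterSystems": [0.0, 0.0, 0.0, 0.0],   # constant column
    "EUI": [94.9, 89.4, 83.8, 95.0],
    "Zone": ["1A", "1A", "2B", "2B"],
})

# nunique() counts distinct values per column; a count of 1 means constant.
constant_cols = toy.columns[toy.nunique() == 1].tolist()
reduced = toy.drop(columns=constant_cols)

print(constant_cols)
print(reduced.columns.tolist())
```

This only detects constant columns; the other columns dropped in the notebook were chosen by the author and would still need to be listed by hand.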
EDA: Exploratory Data Analysis
# Group the column names by dtype (object, float64, int64)
df2 = data.copy()
obls = df2.select_dtypes(include='object').columns.tolist()
flols = df2.select_dtypes(include='float64').columns.tolist()
intls = df2.select_dtypes(include='int64').columns.tolist()
import matplotlib.pyplot as plt
plt.figure(figsize=(15,5))
data.boxplot()
plt.xticks(rotation = 90)
[Figure: box plots of the 29 numeric columns, TotalArea through HeatRecovery_Electricity_kBTU_per_sf, with x-axis labels rotated 90°]
plt.figure(figsize=(5,8))
data[['Exterior_Lights_Final_1_W','Exterior_Lights_Final_2_W']].boxplot()
plt.xticks(rotation = 45)
[Figure: box plots of Exterior_Lights_Final_1_W and Exterior_Lights_Final_2_W]
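import seaborn as sns
import matplotlib.pyplot as plt
# Box plot and histogram side by side for five of the float columns
for i in flols[5:10]:
    fig, ax = plt.subplots(1, 2, figsize=(15, 5))
    ax[0].set_title(i)
    ax[1].set_title('Distribution plot')
    sns.boxplot(ax=ax[0], data=data[i], orient='h')
    sns.histplot(ax=ax[1], data=data[i])
    plt.show()  # show after drawing; the original `plt.show` (no parentheses) did nothing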
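Remove outliers: drop every row in which any float column falls below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$, where $Q_1$ is the $1^{st}$ quartile ($25^{th}$ percentile), $Q_3$ is the $3^{rd}$ quartile ($75^{th}$ percentile), and $IQR = Q_3 - Q_1$. Ref: Interquartile Range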
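To make the interquartile-range rule concrete, here is a small sketch on a made-up series (the values are illustrative, not taken from the dataset). It applies the fences to a single column; the notebook applies the same test to every float column and drops a row if any column fails it.

```python
import pandas as pd

# Made-up values with one obvious outlier (100.0).
toy = pd.Series([10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 100.0])

q1 = toy.quantile(0.25)   # first quartile (25th percentile)
q3 = toy.quantile(0.75)   # third quartile (75th percentile)
iqr = q3 - q1

lower = q1 - 1.5 * iqr    # lower fence
upper = q3 + 1.5 * iqr    # upper fence

# Keep only the values that lie within the fences.
kept = toy[(toy >= lower) & (toy <= upper)]
print(lower, upper)
print(kept.tolist())
```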
# import seaborn as sns
# import matplotlib.pyplot as plt
# for i in flols:
# fig, ax = plt.subplots(1, 2,figsize=(15,5))
# ax[0].set_title('Boxplot')
# ax[1].set_title('Distribution plot')
# sns.boxplot(ax=ax[0], data=data[i], orient='h')
# sns.histplot(ax=ax[1], data=data[i])
# Keep only the float columns for the IQR test
dfiqr = data.drop(columns=intls + obls)
#q1,q3 =np.percentile(data,[25,75], axis=0)
tmp = dfiqr.quantile(([0.25,0.75]))
# print(tmp)
q1 = tmp.iloc[0,:]
q3 = tmp.iloc[1,:]
IQR = q3-q1
# print("IQR\n",IQR)
upper = q3+1.5*IQR
lower = q1-1.5*IQR
# print("Upper fence (q3 + 1.5*IQR)\n",upper)
# print("Lower fence (q1 - 1.5*IQR)\n",lower)
# Find index to drop
result = dfiqr[((dfiqr>upper).any(axis = 1)) | ((dfiqr<lower).any(axis = 1))].index
result
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7,
8, 9,
...
260738, 260739, 260740, 260741, 260742, 260743, 260744, 260745,
260746, 260747],
dtype='int64', length=158695)
dfiqr["ID"] = data["ID"]
dfiqrnew = dfiqr.drop(index=result)
dfiqrnew.head()
| | WWR | Wall_R_Value | Roof_R_Value | Glass_and_Frame_U_Value | SHGC | Interior_Lights_Final_W_per_sf | Exterior_Lights_Final_1_W | Exterior_Lights_Final_2_W | COP_Efficiency_Cooling | EUI_kBTU_per_sf | ... | Heating_Electricity_kBTU_per_sf | Heating_NaturalGas_kBTU_per_sf | Heating_Total_kBTU_per_sf | Lighting_Electricity_kBTU_per_sf | Equipment_Electricity_kBTU_per_sf | Fans_Electricity_kBTU_per_sf | Pumps_Electricity_kBTU_per_sf | HeatRejection_Electricity_kBTU_per_sf | HeatRecovery_Electricity_kBTU_per_sf | ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 25.000000 | 30.00 | 60.00 | 0.20 | 0.25 | 1.0 | 441.80 | 8552.0 | 3.552525 | 94.998433 | ... | 0.050334 | 0.0 | 0.050334 | 10.209886 | 44.423582 | 3.296442 | 3.845008 | 0.0 | 3.337595 | College_1A_100000_14286_120_13_25_20_25_BaseLi... |
| 17 | 25.000000 | 30.00 | 60.00 | 0.20 | 0.25 | 1.0 | 441.80 | 8552.0 | 3.996591 | 89.394823 | ... | 0.040897 | 0.0 | 0.040897 | 10.209886 | 44.423582 | 3.296442 | 3.845008 | 0.0 | 3.337595 | College_1A_100000_14286_120_13_25_20_25_BaseLi... |
| 23 | 25.000000 | 30.00 | 60.00 | 0.20 | 0.25 | 1.0 | 441.80 | 8552.0 | 4.440657 | 83.791214 | ... | 0.031459 | 0.0 | 0.031459 | 10.209886 | 44.423582 | 3.296442 | 3.845008 | 0.0 | 3.337595 | College_1A_100000_14286_120_13_25_20_25_BaseLi... |
| 29 | 25.000000 | 8.33 | 21.34 | 0.66 | 0.25 | 0.4 | 176.72 | 3420.8 | 2.960438 | 94.982676 | ... | 0.135453 | 0.0 | 0.135453 | 4.083955 | 44.423582 | 3.185673 | 3.685125 | 0.0 | 3.213624 | College_1A_100000_14286_120_13_25_66_25_BestLi... |
| 31 | 74.152542 | 8.33 | 21.34 | 0.66 | 0.25 | 0.4 | 176.72 | 3420.8 | 3.552525 | 93.761766 | ... | 0.172216 | 0.0 | 0.172216 | 4.083955 | 44.423582 | 3.352672 | 4.427529 | 0.0 | 3.426734 | College_1A_100000_14286_120_13_25_66_25_BestLi... |
5 rows × 23 columns
# Min-max scale each float column to the range [0, 1]
dfiqrnew2 = dfiqrnew.copy()
for i in flols:
    # Renamed from max/min to avoid shadowing the Python built-ins
    col_max = dfiqrnew[i].max()
    col_min = dfiqrnew[i].min()
    # Alternative (z-score): (dfiqrnew[i] - dfiqrnew[i].mean()) / dfiqrnew[i].std()
    dfiqrnew2[i] = (dfiqrnew[i] - col_min) / (col_max - col_min)
dfiqrnew2.head()
| | WWR | Wall_R_Value | Roof_R_Value | Glass_and_Frame_U_Value | SHGC | Interior_Lights_Final_W_per_sf | Exterior_Lights_Final_1_W | Exterior_Lights_Final_2_W | COP_Efficiency_Cooling | EUI_kBTU_per_sf | ... | Heating_Electricity_kBTU_per_sf | Heating_NaturalGas_kBTU_per_sf | Heating_Total_kBTU_per_sf | Lighting_Electricity_kBTU_per_sf | Equipment_Electricity_kBTU_per_sf | Fans_Electricity_kBTU_per_sf | Pumps_Electricity_kBTU_per_sf | HeatRejection_Electricity_kBTU_per_sf | HeatRecovery_Electricity_kBTU_per_sf | ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 0.000000 | 1.0 | 1.0 | 0.0 | 0.0 | 1.000000 | 0.034954 | 1.0 | 0.091653 | 0.582000 | ... | 0.281023 | 0.0 | 0.002372 | 0.646387 | 0.578016 | 0.086309 | 0.723544 | 0.0 | 0.940548 | College_1A_100000_14286_120_13_25_20_25_BaseLi... |
| 17 | 0.000000 | 1.0 | 1.0 | 0.0 | 0.0 | 1.000000 | 0.034954 | 1.0 | 0.160393 | 0.537886 | ... | 0.228331 | 0.0 | 0.001927 | 0.646387 | 0.578016 | 0.086309 | 0.723544 | 0.0 | 0.940548 | College_1A_100000_14286_120_13_25_20_25_BaseLi... |
| 23 | 0.000000 | 1.0 | 1.0 | 0.0 | 0.0 | 1.000000 | 0.034954 | 1.0 | 0.229133 | 0.493772 | ... | 0.175639 | 0.0 | 0.001483 | 0.646387 | 0.578016 | 0.086309 | 0.723544 | 0.0 | 0.940548 | College_1A_100000_14286_120_13_25_20_25_BaseLi... |
| 29 | 0.000000 | 0.0 | 0.0 | 1.0 | 0.0 | 0.294118 | 0.010065 | 0.4 | 0.000000 | 0.581876 | ... | 0.756249 | 0.0 | 0.006384 | 0.232265 | 0.578016 | 0.083109 | 0.693458 | 0.0 | 0.905612 | College_1A_100000_14286_120_13_25_66_25_BestLi... |
| 31 | 0.757238 | 0.0 | 0.0 | 1.0 | 0.0 | 0.294118 | 0.010065 | 0.4 | 0.091653 | 0.572265 | ... | 0.961502 | 0.0 | 0.008116 | 0.232265 | 0.578016 | 0.087934 | 0.833161 | 0.0 | 0.965667 | College_1A_100000_14286_120_13_25_66_25_BestLi... |
5 rows × 23 columns
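The scaling step maps each value $x$ to $(x - x_{min}) / (x_{max} - x_{min})$, and it can be undone with $x = x_{scaled} \cdot (x_{max} - x_{min}) + x_{min}$, which is what the notebook later does to report errors in kBTU/sf. A minimal pure-Python sketch of the round trip follows. The function names and sample values are hypothetical, and the guard for a constant column is an added assumption (the notebook's own code has no such guard).

```python
def min_max_scale(values):
    """Scale values to [0, 1]; also return the min and max needed to invert."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: map a constant column to 0.0 instead of dividing by zero.
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


eui = [60.0, 75.0, 90.0, 120.0]          # made-up EUI values
scaled, lo, hi = min_max_scale(eui)
restored = min_max_invert(scaled, lo, hi)
print(scaled)
print(restored)
```

Note that the minimum and maximum used for the inverse must be the ones computed on the same frame that was scaled.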
# Drop the unscaled float columns so the merge brings in the scaled versions
data2 = data.drop(columns=flols)
dfn = pd.merge(data2,dfiqrnew2, on = "ID",how = "inner")
dfn.head()
| | ID | BuildingType | ClimateZone | TotalArea | FloorArea | NumFloors | PlateDepth | PlateLength | FloorHeight | Height | ... | Cooling_Electricity_kBTU_per_sf | Heating_Electricity_kBTU_per_sf | Heating_NaturalGas_kBTU_per_sf | Heating_Total_kBTU_per_sf | Lighting_Electricity_kBTU_per_sf | Equipment_Electricity_kBTU_per_sf | Fans_Electricity_kBTU_per_sf | Pumps_Electricity_kBTU_per_sf | HeatRejection_Electricity_kBTU_per_sf | HeatRecovery_Electricity_kBTU_per_sf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | College_1A_100000_14286_120_13_25_20_25_BaseLi... | College | 1A | 101793 | 14542 | 7 | 122 | 350 | 13 | 91 | ... | 0.821321 | 0.281023 | 0.0 | 0.002372 | 0.646387 | 0.578016 | 0.086309 | 0.723544 | 0.0 | 0.940548 |
| 1 | College_1A_100000_14286_120_13_25_20_25_BaseLi... | College | 1A | 101793 | 14542 | 7 | 122 | 350 | 13 | 91 | ... | 0.667235 | 0.228331 | 0.0 | 0.001927 | 0.646387 | 0.578016 | 0.086309 | 0.723544 | 0.0 | 0.940548 |
| 2 | College_1A_100000_14286_120_13_25_20_25_BaseLi... | College | 1A | 101793 | 14542 | 7 | 122 | 350 | 13 | 91 | ... | 0.513149 | 0.175639 | 0.0 | 0.001483 | 0.646387 | 0.578016 | 0.086309 | 0.723544 | 0.0 | 0.940548 |
| 3 | College_1A_100000_14286_120_13_25_66_25_BestLi... | College | 1A | 101793 | 14542 | 7 | 122 | 350 | 13 | 91 | ... | 0.998144 | 0.756249 | 0.0 | 0.006384 | 0.232265 | 0.578016 | 0.083109 | 0.693458 | 0.0 | 0.905612 |
| 4 | College_1A_100000_14286_120_13_25_66_25_BestLi... | College | 1A | 101793 | 14542 | 7 | 122 | 350 | 13 | 91 | ... | 0.932584 | 0.961502 | 0.0 | 0.008116 | 0.232265 | 0.578016 | 0.087934 | 0.833161 | 0.0 | 0.965667 |
5 rows × 39 columns
# Encode each object column as integer category codes
df2 = dfn.copy()
for i in obls:
    df2[i] = pd.Categorical(df2[i]).codes
# df2.info()
# Cast every column to float
df2 = df2.astype('float')
df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 102053 entries, 0 to 102052
Data columns (total 39 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | ID | 102053 non-null | float64 |
| 1 | BuildingType | 102053 non-null | float64 |
| 2 | ClimateZone | 102053 non-null | float64 |
| 3 | TotalArea | 102053 non-null | float64 |
| 4 | FloorArea | 102053 non-null | float64 |
| 5 | NumFloors | 102053 non-null | float64 |
| 6 | PlateDepth | 102053 non-null | float64 |
| 7 | PlateLength | 102053 non-null | float64 |
| 8 | FloorHeight | 102053 non-null | float64 |
| 9 | Height | 102053 non-null | float64 |
| 10 | SolarDesign | 102053 non-null | float64 |
| 11 | HVAC_Setting | 102053 non-null | float64 |
| 12 | EnvelopeQuality_Setting | 102053 non-null | float64 |
| 13 | LPD_Adjustment_Setting | 102053 non-null | float64 |
| 14 | HeatingCoil | 102053 non-null | float64 |
| 15 | COP_Efficiency_Heating | 102053 non-null | float64 |
| 16 | CoolingCoil | 102053 non-null | float64 |
| 17 | WWR | 102053 non-null | float64 |
| 18 | Wall_R_Value | 102053 non-null | float64 |
| 19 | Roof_R_Value | 102053 non-null | float64 |
| 20 | Glass_and_Frame_U_Value | 102053 non-null | float64 |
| 21 | SHGC | 102053 non-null | float64 |
| 22 | Interior_Lights_Final_W_per_sf | 102053 non-null | float64 |
| 23 | Exterior_Lights_Final_1_W | 102053 non-null | float64 |
| 24 | Exterior_Lights_Final_2_W | 102053 non-null | float64 |
| 25 | COP_Efficiency_Cooling | 102053 non-null | float64 |
| 26 | EUI_kBTU_per_sf | 102053 non-null | float64 |
| 27 | Electricity_Facility_kBTU_per_sf | 102053 non-null | float64 |
| 28 | NaturalGas_Facility_kBTU_per_sf | 102053 non-null | float64 |
| 29 | Cooling_Electricity_kBTU_per_sf | 102053 non-null | float64 |
| 30 | Heating_Electricity_kBTU_per_sf | 102053 non-null | float64 |
| 31 | Heating_NaturalGas_kBTU_per_sf | 102053 non-null | float64 |
| 32 | Heating_Total_kBTU_per_sf | 102053 non-null | float64 |
| 33 | Lighting_Electricity_kBTU_per_sf | 102053 non-null | float64 |
| 34 | Equipment_Electricity_kBTU_per_sf | 102053 non-null | float64 |
| 35 | Fans_Electricity_kBTU_per_sf | 102053 non-null | float64 |
| 36 | Pumps_Electricity_kBTU_per_sf | 102053 non-null | float64 |
| 37 | HeatRejection_Electricity_kBTU_per_sf | 102053 non-null | float64 |
| 38 | HeatRecovery_Electricity_kBTU_per_sf | 102053 non-null | float64 |

dtypes: float64(39)
memory usage: 31.1 MB
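The encoding step above replaces each string category with an integer code. A short sketch of what that does, and of one-hot encoding as a possible alternative, is given below. "College" appears in the data; the other building types here are hypothetical. Category codes follow the sorted order of the labels, so the resulting numbers imply an ordering that has no physical meaning. Tree-based models often tolerate this, but it is worth keeping in mind when reading the correlation heatmap.

```python
import pandas as pd

# "Office" and "Hotel" are hypothetical labels used only for illustration.
toy = pd.DataFrame({"BuildingType": ["College", "Office", "College", "Hotel"]})

# Integer codes, as in the notebook: categories are sorted alphabetically.
codes = pd.Categorical(toy["BuildingType"]).codes
print(list(codes))

# Alternative: one indicator column per category, with no implied order.
one_hot = pd.get_dummies(toy["BuildingType"], prefix="BuildingType")
print(one_hot.columns.tolist())
```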
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(24, 15))
data2 = df2.copy()
data2 = data2.iloc[:,1:27]
col = data2.columns
corr_sp = data2[col].corr(method='pearson')
# plot correlation
sns.heatmap(abs(corr_sp), annot=True, cmap="flare") # summer_r
plt.title('Correlation Heatmap', fontdict={'fontsize':24}, pad=12)
plt.show()
import xgboost as xgb
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
dfnew = df2.copy()
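# Features: columns 1-25 (BuildingType through COP_Efficiency_Cooling); ID (column 0) is excluded
# Target: EUI_kBTU_per_sf (column 26, the same column as iloc[:, -13])
X, y = dfnew.iloc[:,1:26],dfnew['EUI_kBTU_per_sf']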
data_dmatrix = xgb.DMatrix(data=X,label=y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=999)
xg_reg = xgb.XGBRegressor(objective ='reg:squarederror', colsample_bytree = 0.3, learning_rate = 0.1,
max_depth = 5, alpha = 10, n_estimators = 10)
xg_reg.fit(X_train,y_train)
preds = xg_reg.predict(X_test)
# Undo the min-max scaling so the error is reported in kBTU/sf
max1 = dfiqrnew['EUI_kBTU_per_sf'].max()
min1 = dfiqrnew['EUI_kBTU_per_sf'].min()
y_test_new = (y_test*(max1-min1)) + min1
preds_new = (preds*(max1-min1)) + min1
print("Mean Absolute Error: ",mean_absolute_error(y_test_new, preds_new))
Mean Absolute Error: 9.72474943313263
X_test2 = X_test.copy()
X_test2['PREDICT'] = preds_new
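# Use y_test_new, which shares X_test's index; data['EUI_kBTU_per_sf'] still carries the
# pre-merge row labels, so assigning it here would align against different rows
X_test2['REAL DATA'] = y_test_new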
X_test2['ERROR'] = abs(X_test2['PREDICT'] - X_test2['REAL DATA'])
X_test2.iloc[:5,-3:]
| | PREDICT | REAL DATA | ERROR |
|---|---|---|---|
| 36048 | 51.335945 | 68.263557 | 16.927612 |
| 94705 | 89.275932 | 70.950452 | 18.325480 |
| 21615 | 76.678230 | 64.576512 | 12.101718 |
| 1480 | 80.930763 | 84.433132 | 3.502369 |
| 92790 | 87.536217 | 59.044387 | 28.491830 |
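When a Series is assigned as a DataFrame column, pandas matches rows by index label, not by position. Because `pd.merge` gives its result a fresh 0-based index, a Series taken from a frame created before the merge can line up with different rows than intended. The toy sketch below (hypothetical IDs and values) shows the effect:

```python
import pandas as pd

original = pd.DataFrame({"ID": ["a", "b", "c", "d"],
                         "EUI": [10.0, 20.0, 30.0, 40.0]})

# Filtering keeps the original labels 1, 2, 3 ...
filtered = original[original["EUI"] > 15.0]
# ... but merging produces a new index 0, 1, 2 for IDs b, c, d.
merged = pd.merge(original[["ID"]], filtered, on="ID", how="inner")

test_rows = merged.iloc[[2]].copy()          # the row for ID "d", label 2
test_rows["misaligned"] = original["EUI"]    # label 2 in `original` is ID "c"
test_rows["aligned"] = merged["EUI"]         # label 2 in `merged` is ID "d"
print(test_rows)
```

Taking the comparison values from the same frame that produced the test split keeps the labels consistent.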
MAE: Mean Absolute Error
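For $n$ samples with true values $y_i$ and predictions $\hat{y}_i$, the mean absolute error is $\frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$, expressed in the same units as the target (here kBTU/sf). A minimal sketch of the computation by hand, using made-up numbers and a hypothetical helper name, is:

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values, not results from the model above.
y_true = [68.0, 71.0, 65.0, 84.0]
y_pred = [51.0, 89.0, 77.0, 81.0]

mae = mean_absolute_error_manual(y_true, y_pred)
print(mae)  # absolute errors are 17, 18, 12, 3, so the mean is 12.5
```

This is the quantity that `sklearn.metrics.mean_absolute_error` is used for in the notebook.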